Abstract

We present a dynamic data structure that keeps track of an acyclic hypergraph (equivalently, a
triangulated graph) and enables verifying that adding a candidate hyperedge (clique) will not break the
acyclicity of the augmented hypergraph. This is a generalization of the use of Tarjan's Union-Find data
structure for maintaining acyclicity when augmenting forests, and the amortized time per operation has
a similar almost-constant dependence on the size of the hypergraph. Such a data structure is useful when
augmenting acyclic hypergraphs, e.g. in order to greedily construct a high-weight acyclic hypergraph. In
designing this data structure, we introduce a hierarchical decomposition of acyclic hypergraphs that aids
in understanding hyper-connectivity, and introduce a novel concept of a hypercycle which is excluded
from acyclic hypergraphs.

1 Introduction 

Acyclic hypergraphs, or hyperforests (such as the one in Figure 1(a)), are a natural generalization of forests. 
They have been independently, and equivalently, defined in many different domains, and are also studied 
as triangulated graphs (hyperforests are those hypergraphs formed by the cliques of triangulated graphs). 
Hyperforests are useful in many domains where higher-order relations are to be captured, but certain tree- 
like "acyclic" properties are desired. 




Figure 1 : An example of a 2-hyperforest. 

Acyclicity can allow many calculations to be carried out efficiently using dynamic programming. Such 
calculations include a broad class of combinatorial problems [Cou90] as well as inference in graphical 
models [Bes74]. In such applications the computation is often exponential in the width of the hypergraph, 
which corresponds to the maximum size of the hyperedges (or cliques in a triangulated graph). The class of 
K -hyperforests (width at most K) is then of particular interest. 

When a K-hyperforest is necessary, one often wishes to choose the best possible K-hyperforest, where
the quality of a hyperforest is captured as the sum of precomputed weights over its hyperedges, leading to
the problem of finding a maximum-weight K-hyperforest [KS01]. Such a procedure is common in several
different domains, for the special case where K = 1 and one seeks a maximum-weight tree: maximum
likelihood Markov trees, known as Chow-Liu trees [CL68]; Hunter-Worsley trees for Bonferroni
inequalities [Wor82]; and when trees are used to ensure efficient combinatorial optimization, e.g. [Mat99].
Generalizations to higher width hyperforests are possible, and desirable, and have recently been investigated
[Mal91, Sre01, BP01, Tom86].

Unfortunately, when K > 1 finding the maximum weight K-hyperforest is NP-hard, and finding
good approximation algorithms remains an open problem. The common heuristic approach is a Prim-like
greedy approach, maintaining a fully connected hypertree, and adding to it only single vertices [Mal91,
BP01, BJ02]. Alternatively, one might consider a more flexible, and possibly more powerful, Kruskal-like
greedy approach, adding hyperedges to a possibly unconnected K-hyperforest. In order to do so, it is
necessary to ensure that a new hyperedge about to be added does not break the acyclicity.

A particular situation where the Kruskal-like approach is necessary is when we would like to greedily 
augment an initial, possibly unconnected, hyperforest. This might be a required, or strongly desirable, 
substructure, or a high-weight substructure found by global search techniques (it is possible to efficiently 
find hyperforests containing at least a constant fraction of the optimal weight [KS01]). 

Acyclicity is also important in order to preclude possible conflicts in, e.g. relational databases [BFMY83], 
or when learning graphical models [Bes74]. In such applications, one might want to ensure that new rela- 
tions added, e.g. to a database scheme, do not break its acyclicity. 

When augmenting forests, Tarjan's Union-Find dynamic data structure enables checking efficiently if a 
new edge breaks the acyclicity, by keeping track of the connected components in the graph. The main result 
in this paper is a dynamic data structure that serves a purpose analogous to Tarjan's Union-Find structure 
in hyperforests: for any candidate hyperedge, the data structure enables verifying that adding the hyperedge 
will not break the acyclicity of the augmented hyperforest. The amortized time per operation is almost
independent of the hypergraph size (dependent through the inverse of Ackermann's function).
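The forest case described above can be illustrated with a minimal sketch. The class name `UnionFind` and the helper `edge_keeps_forest` are hypothetical names chosen for this illustration; the code assumes the standard disjoint-set technique with path compression and union by size, and is not taken from the paper.

```python
class UnionFind:
    """Disjoint sets with path compression and union by size (illustrative)."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Lazily create a singleton set for unseen elements.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]


def edge_keeps_forest(uf, u, v):
    """True iff adding edge (u, v) leaves the graph a forest."""
    return uf.find(u) != uf.find(v)


uf = UnionFind()
for u, v in [("a", "b"), ("b", "c")]:
    assert edge_keeps_forest(uf, u, v)
    uf.union(u, v)
closes_cycle = not edge_keeps_forest(uf, "a", "c")
```

Here the edge (a, c) is rejected because a and c already lie in the same connected component, which is exactly the single notion of connectedness that no longer suffices for hyperforests.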

We show how in hyperforests, it is no longer enough to consider a single type of connectedness. Thus, 
the simple notion of connected components, which can be captured using a single Union-Find structure, is 
not enough. Instead, we present a novel view of hyperforests, at different levels, each highlighting a different 
degree of connectivity, and use a separate Union-Find structure for each level. 

On the way to developing such a data structure, we also suggest the notion of a hypercycle. Although
acyclic hypergraphs have been studied for the past three decades, using many equivalent characterizations,
we are not aware of any characterization that directly defines a hypercycle and characterizes acyclic
hypergraphs as those that do not have hypercycles. Such a characterization, which we give in Definition 9 and
Theorem 12, provides added insight into hyperforests.

The rest of this paper is organized as follows: in Section 2 we define hyperforests and specify the 
desired data structure. In Section 3 we examine the concepts of hyperconnectivity and hypercycles and lay 
the foundations for the proof of the data structure, which is presented in Section 4. Finally, in Section 5 we 
discuss the problem of finding maximum weight hypertrees and the utility of Kruskal-like greedy approaches 
over Prim-like approaches. 

2 Hypergraph Acyclicity 

Preliminaries A hypergraph H(V) is a collection of subsets, or hyperedges, of the vertex set V:
$H(V) \subseteq 2^V$. If $h' \subseteq h \in H$ then the hyperedge $h'$ is covered by H. Of particular interest
are the maximal hyperedges of a hypergraph H, which are not covered by any other hyperedges in H; in fact,
in this paper we refer to H as containing only such maximal hyperedges, while denoting by $\bar{H}$ the
collection of all covered hyperedges:
$\bar{H} = \{h \subseteq V \mid \exists h' \in H,\ h \subseteq h'\}$. We say that a hypergraph $H_1$ covers
$H_2$ if $\bar{H}_2 \subseteq \bar{H}_1$.
The projection of a hypergraph H onto a set of vertices s is $H_s = \{h \cap s \mid h \in H\}$.
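As a small illustration of these preliminaries, the sketch below computes the collection of covered hyperedges, the covering relation, and a projection. The function names `covered`, `covers`, and `projection` are hypothetical, and hyperedges are assumed to be represented as frozensets of sortable vertex labels.

```python
from itertools import combinations


def covered(H):
    """All subsets of hyperedges of H (the collection of covered hyperedges)."""
    out = set()
    for h in H:
        for r in range(len(h) + 1):  # r = 0 contributes the empty set
            out.update(frozenset(c) for c in combinations(sorted(h), r))
    return out


def covers(H1, H2):
    """True iff every hyperedge covered by H2 is also covered by H1."""
    return covered(H2) <= covered(H1)


def projection(H, s):
    """Projection of H onto the vertex set s: intersect each hyperedge with s."""
    s = frozenset(s)
    return {frozenset(h) & s for h in H}


H = [frozenset("abc"), frozenset("cd")]
print(sorted("".join(sorted(h)) for h in projection(H, "bcd")))
```

Enumerating all subsets is exponential in the hyperedge size, which is acceptable only because hyperedges of a K-hyperforest have at most K + 1 vertices.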



Several equivalent definitions of hypergraph acyclicity are in common use (see [Sre00] for a review).
Here, we define acyclicity using the notion of a tree structure:

Definition 1. A hypergraph H is said to have a tree structure T(H) iff T is a tree over all the hyperedges of
H and the following path overlap property holds: If $(h_1, h_2, \ldots, h_m)$ is a path of H-hyperedges in T, then
$\forall\, 1 < i < m,\ h_1 \cap h_m \subseteq h_i$.

Definition 2. A hypergraph is acyclic iff it has a tree structure. An acyclic hypergraph is also referred to 
as a hyperforest. We say that a hyperforest has width (at most) K, and refer to it as a K-hyperforest, if its 
hyperedges are of size at most K + 1. 
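The path overlap property of Definition 1 can be checked directly for a given candidate tree. The following is a brute-force sketch for illustration only; the names `tree_path` and `has_path_overlap_property` are hypothetical, and the code assumes that `tree_edges` really forms a tree over the hyperedge indices (it does not verify this).

```python
from itertools import combinations


def tree_path(adj, start, goal):
    """Unique path between two nodes of a tree given as an adjacency dict."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")


def has_path_overlap_property(hyperedges, tree_edges):
    """Check h_1 & h_m <= h_i for every hyperedge h_i on every tree path."""
    adj = {i: set() for i in range(len(hyperedges))}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for a, b in combinations(range(len(hyperedges)), 2):
        common = hyperedges[a] & hyperedges[b]
        if any(not common <= hyperedges[i] for i in tree_path(adj, a, b)):
            return False
    return True


H = [frozenset("abc"), frozenset("bcd"), frozenset("cde")]
good = has_path_overlap_property(H, [(0, 1), (1, 2)])
bad = has_path_overlap_property(H, [(0, 2), (2, 1)])
```

In this example the chain abc, bcd, cde is a valid tree structure, while routing the path from abc to bcd through cde is not, because cde does not contain their overlap {b, c}.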

Problem Statement: Checking Acyclicity We seek a data structure that will allow us to augment a
hyperforest by adding hyperedges to it, ensuring that it remains acyclic. That is, the data structure should
keep track of the "current" hyperforest H and support two operations, where $h_{\text{new}}$ is a hyperedge:

QUERY($h_{\text{new}}$) returns TRUE iff $H \cup \{h_{\text{new}}\}$ is acyclic.
INSERT($h_{\text{new}}$) augments $H \leftarrow H \cup \{h_{\text{new}}\}$, assuming that it is acyclic.



Bounded Tree-Width Hypergraphs Unlike forests, hyperforests do not form a monotone family of
hypergraphs: a sub-hypergraph of an acyclic hypergraph might be cyclic, and conversely, adding hyperedges
to a cyclic hypergraph might make it acyclic. When a tree structure is used in order to perform efficient 
computation using dynamic programming (e.g. inference in graphical models), it is often admissible to add 
extra hyperedges to a cyclic hypergraph in order to obtain a covering hyperforest. Computation is then per- 
formed using the tree structure of this covering hyperforest. The important requirement is that the width of 
the covering hyperforest be small, as computation is exponential in this width: 

Definition 3. The tree-width of a hypergraph H is the minimum width of a hyperforest that covers H.

Accordingly, in such situations, one might want to have a data structure that checks whether adding a
hyperedge maintains low tree-width. If all hyperedges added are of size at most K + 1, then the data
structure presented here ensures a tree-width of not more than K. The converse is not true: a hyperedge
$h_{\text{new}}$ might be refused even though $H \cup \{h_{\text{new}}\}$ has tree-width at most K.

Before considering dynamic data structures for maintaining low tree-width, it is important to remember 
that calculating the tree-width statically, or equivalently finding a narrow triangulation, is by itself a very 
difficult task. Although linear time algorithms for constant width have recently been discovered [Bod96], the 
dependence on the width is extremely prohibitive and these algorithms are not usable in practice. Instead, 
various heuristics, approximation algorithms, and super-polynomial-time algorithms are used [SG97]. 

Augmentation in Greedy Algorithms Our main motivation for the dynamic data structure stems from
greedy Kruskal-like construction of high-weight hyperforests, particularly in order to find maximum
likelihood Markov networks of bounded tree-width. If the weights on hyperedges are all non-negative, it may
be appropriate to allow bounded tree-width hypergraphs in intermediate stages, requiring a dynamic data
structure that checks tree-width rather than acyclicity. However, in situations in which weights might be
negative or positive, as in the case when the weight of a hypergraph corresponds to its likelihood [Sre01],
we cannot allow intermediate cyclic hypergraphs, as making them acyclic might introduce high negative
weights (the weight of a cyclic hypergraph does not correspond to its likelihood). For such applications,
acyclicity is the correct property to require (along with ensuring each added hyperedge is of proper size).



3 A new look at hyperforests 

3.1 Hyperconnectivity 

Connectivity in hyperforests is substantially more complex than connectivity in forests. In forests, two edges
are either incident or disjoint. In a K-hyperforest, there are K + 1 "degrees" at which two hyperedges can
overlap, corresponding to overlap sizes ranging from 0 to K. We suggest a hierarchical decomposition of a
hyperforest into superedges, which allows us to concentrate on one degree of connectivity at a time.

Definition 4 (Superedge). Two hyperedges $h_1$ and $h_m$ are k-connected (see footnote 1) if there exists a
sequence of hyperedges $h_1, \ldots, h_m$ such that $|h_i \cap h_{i+1}| \ge k$ for all $1 \le i \le m - 1$.
A k-superedge q of a hyperforest H is a maximal (k + 1)-connected subset of H.

Denote the set of all vertices in the union of all the hyperedges in a superedge q as $\bar{q} = \bigcup q$.
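To make the definition concrete, here is a minimal sketch that partitions a list of hyperedges into k-superedges. It assumes the reading of Definition 4 in which k-connectivity means consecutive overlaps of size at least k, so that a k-superedge groups hyperedges linked by overlaps of size at least k + 1. The function name `superedges` is hypothetical, and the quadratic pairwise scan is for illustration only.

```python
def superedges(hyperedges, k):
    """Partition hyperedges into classes linked by overlaps of size >= k + 1."""
    hs = [frozenset(h) for h in hyperedges]
    parent = list(range(len(hs)))

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(hs)):
        for j in range(i + 1, len(hs)):
            if len(hs[i] & hs[j]) >= k + 1:
                parent[find(i)] = find(j)

    groups = {}
    for i, h in enumerate(hs):
        groups.setdefault(find(i), set()).add(h)
    return [frozenset(g) for g in groups.values()]


H = ["abc", "bcd", "def"]  # a 2-hyperforest
counts = [len(superedges(H, k)) for k in (2, 1, 0, -1)]
```

For this example the 2-superedges are the three hyperedges themselves, the 1-superedges merge abc with bcd (overlap {b, c}), and the 0-superedges and the (-1)-superedge each consist of the whole connected hypergraph.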



Definition 5 (Overlap of superedges). The overlap between two superedges $q_1$ and $q_2$ of H is
$s = \bar{q}_1 \cap \bar{q}_2$. $q_1$ and $q_2$ are said to overlap simply if $s = h_1 \cap h_2$ for some
$h_1 \in q_1$, $h_2 \in q_2$.

Figure 2 shows a hierarchical decomposition of a 3-hyperforest into superedges. Each (k - 1)-superedge
contains a tree structure over the k-superedges that is a minor of the tree structure over all the hyperedges. In
a K-hyperforest H, the K-superedges are the hyperedges, the 0-superedges are the connected components,
and the single (-1)-superedge is the entire hyperforest H.

We can now look at a K-hyperforest one level at a time. At level k, we consider only k-connectivity
by focusing on the k-superedges in a (k - 1)-superedge. (< k)-connectivity does not exist in a
(k - 1)-superedge (see footnote 2), and (> k)-connectivity is abstracted into the k-superedges. As a result,
the tree structure over the k-superedges in a (k - 1)-superedge has the desirable property that all overlaps
between adjacent k-superedges have the same size.

Figure 2: An example of the hierarchical decomposition of superedges. (a) depicts a hypergraph H, and (b)
shows the tree structures of H at different levels.




Lemma 6. The path in a tree structure T between two hyperedges in the same superedge q contains only 
hyperedges from q (i.e. the hyperedges of a superedge appear contiguously in T). 



1 Note that this is not the usual definition of k-connected.

2 In this way, a k-superedge q "functions" as a k-hyperedge because all overlaps between q and any hyperedge
not in q have size at most k.



Proof. Since any overlap along the path in T between two hyperedges is a separator between them, and the
two hyperedges are (k + 1)-connected, all overlaps must be of size at least k + 1, so all hyperedges along
the path are (k + 1)-connected. □

Let us now show that a hierarchical decomposition such as the one in Figure 2 exists for all hyperforests.
We do this by constructing, for each (k - 1)-superedge p, a tree structure over the k-superedges of p.

Definition 7. If p is a (k - 1)-superedge, a tree structure $T_Q$ over a set of k-superedges $Q_p$ in p is a tree
such that the following hold:

1. Any two k-superedges in $Q_p$ overlap simply.

2. For all paths $q_1, \ldots, q_m$ in $T_Q$, $\forall\, 1 < i < m,\ \bar{q}_1 \cap \bar{q}_m \subseteq \bar{q}_i$.

Theorem 8. A hypergraph H is a hyperforest iff for all k, for all (k - 1)-superedges p, there exists a tree
structure over the set $Q_p$ of k-superedges of p.

Proof.

$\Rightarrow$: From the tree structure $T_H$ of H, we will construct a tree structure $T_Q$ over $Q_p$, which is a
minor of $T_H$. Include $(q_1, q_2) \in T_Q$ iff there is some edge $(h_1, h_2) \in T_H$ with $h_1 \in q_1$ and
$h_2 \in q_2$. Call $h_1$ and $h_2$ the gateway hyperedges of the edge $(q_1, q_2) \in T_Q$. Now we have to show
that $T_Q$ is a valid tree structure.

To verify that $T_Q$ is actually a tree, we will show that a cycle $q_0, q_1, \ldots, q_m$ in $T_Q$ means there is a
cycle in $T_H$ (see footnote 3). Let $h_{i-1} \in q_{i-1}$ and $g_i \in q_i$ be the two gateway hyperedges of
$(q_{i-1}, q_i)$. By Lemma 6, the path from $g_i$ to $h_i$ in $T_H$ contains only hyperedges in $q_i$. Consider the
sequence $c = h_0, g_1, \ldots, h_1, g_2, \ldots, h_2, \ldots, g_m, \ldots, h_m$.
In any case, there are at least $m \ge 3$ distinct hyperedges, so c is a cycle in $T_H$.

To verify that all superedges overlap simply, we will show that the overlap between $q_1, q_m \in Q$ is
contained in gateway hyperedges of $q_1$ and $q_m$. Let $q_1, \ldots, q_m$ be the path in $T_Q$, and $h_1 \in q_1$ and
$h_m \in q_m$ be gateway hyperedges of $(q_1, q_2)$ and $(q_{m-1}, q_m)$, respectively. Both $h_1$ and $h_m$ must be
on the path in $T_H$ from any $h'_1 \in q_1$ to any $h'_m \in q_m$. Because $T_H$ is a tree structure,
$h'_1 \cap h'_m \subseteq h_1 \cap h_m$, so $\bar{q}_1 \cap \bar{q}_m \subseteq h_1 \cap h_m$.

To verify the path overlap property, we invoke the path overlap property of $T_H$ and notice that the
overlaps along the path between two k-superedges $q_1, q_m \in Q$ are included in the overlaps along a
corresponding path in $T_H$. Let s be the overlap between $q_1$ and $q_m$ ($s = h_1 \cap h_m$ for some
$h_1 \in q_1$, $h_m \in q_m$). The path from $h_1$ to $h_m$ in $T_H$ contains the gateway hyperedges of each
$(q_i, q_{i+1})$. Since s is contained in every hyperedge in the path $h_1, \ldots, h_m$ (in particular, the gateway
hyperedges), s is also contained in every superedge in the path $q_1, \ldots, q_m$.

$\Leftarrow$: To show that H is a hyperforest, we will construct a tree structure $T_H$ over H. To do this, we
assume by induction that there is a tree structure $T_q$ over each (k + 1)-superedge q of H and then show
that there is a tree structure $T_p$ over each k-superedge p of H. When k = -1, p = H is the single
(-1)-superedge, and we have the desired tree structure $T_H = T_p$.

Base case (k = K): Each k-superedge is a single hyperedge and has a trivial tree structure.

Inductive case (k < K): Assume that there is a tree structure $T_q$ over each (k + 1)-superedge q of H.
Fix any k-superedge p. p is partitioned into a set Q of (k + 1)-superedges, each of which has a tree
structure by the induction hypothesis. Furthermore, there is a tree structure $T_Q$ over Q. We now
construct $T_p$. $T_p$ includes $\bigcup_{q \in Q} T_q$ plus an edge $(h_{q_1}, h_{q_2})$ for each
$(q_1, q_2) \in T_Q$, where $h_{q_1} \in q_1$ and $h_{q_2} \in q_2$ are chosen to be any hyperedges containing
$\bar{q}_1 \cap \bar{q}_2$. It is clear that $T_p$ is a valid tree structure.

□ 



3 Recall that a (simple) cycle is a closed path of at least three vertices, all distinct. To clarify notation, if we
say $q_0, q_1, \ldots, q_m$ is a cycle, $q_0 = q_m$ and $m \ge 3$.



3.2 Hypercycles 

A tree structure proves the acyclicity of a hypergraph, whereas a hypercycle proves the non-acyclicity of a 
hypergraph. 

Definition 9. A k-hypercycle is one of the following:

1. Two k-superedges that overlap non-simply. Call this a hyperdoublet.

2. A sequence $c = q_1, q_2, \ldots, q_m$ ($m \ge 3$) of distinct k-superedges with distinct overlaps
$s_i = \bar{q}_i \cap \bar{q}_{i+1}$ of size exactly k, such that there exists some $s_i$ that does not contain
$s^* = \bar{q}_1 \cap \bar{q}_m$. If $|s^*| = k$, call c a regular hypercycle. Otherwise, call c a hyperloop.

A regular 1-hypercycle is exactly a simple cycle: overlaps between edges of the cycle are the distinct
vertices along it, while the overlap $s^*$ is the vertex between the "first" and "last" edge. In higher order
cycles, we cannot always require that the overlap $s^*$ that "closes" the cycle is also of size exactly k, e.g. (c)
in Figure 3. However, it is not enough to require that the overlap $s^*$ be non-empty (e.g. (d) in the Figure):
to be a cycle a path must "touch" itself "outside" of the common overlap in the path.

Figure 3: Examples of hypercycles: (b) hyperdoublets, (c) a hyperloop, (d) not a hypercycle.

Before introducing the main theorem of this section, we state the following lemma which facilitates 
discussing hypergraphs without hypercycles: 

Definition 10. We say that a sequence of overlaps $s_1, \ldots, s_m$ is block-distinct if all repeated overlaps occur
next to each other (if $s_i = s_j$, then $\forall\, i < k < j,\ s_i = s_k = s_j$).
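Block-distinctness is easy to test in a single pass. The following sketch uses a hypothetical function name `is_block_distinct` and assumes the overlaps are hashable values such as frozensets.

```python
def is_block_distinct(overlaps):
    """True iff equal overlaps only ever appear in one contiguous run."""
    seen = set()
    prev = object()  # sentinel that compares unequal to every overlap
    for s in overlaps:
        if s != prev:
            if s in seen:  # the same overlap starts a second, separate run
                return False
            seen.add(s)
        prev = s
    return True


A, B = frozenset("ab"), frozenset("bc")
print(is_block_distinct([A, A, B]), is_block_distinct([A, B, A]))
```

A sequence with all overlaps distinct is trivially block-distinct, which is the direction used when comparing Lemma 11 with Definition 9.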

Lemma 11. A hypergraph has no k-hypercycles iff both of the following conditions are true:

1. All k-superedges overlap simply.

2. For all sequences $q_1, q_2, \ldots, q_m$ ($m \ge 3$) of distinct k-superedges with block-distinct overlaps
$s_i = \bar{q}_i \cap \bar{q}_{i+1}$ of size exactly k, $s^* = \bar{q}_1 \cap \bar{q}_m$ is contained in every $s_i$.

Proof. The difference between this lemma and the definition of a hypercycle is that in condition 2, we
require block-distinct, whereas in part 2 of Definition 9, we say distinct. Before we show that distinct and
block-distinct are interchangeable, we note that in condition 2 of Lemma 11, $\forall i,\ s^* \subseteq s_i$ can be
equivalently replaced by $\exists i,\ s^* \subseteq \bar{q}_i$: suppose that $s^* \subseteq \bar{q}_i$ for some i. Then
$s^* \subseteq \bar{q}_1 \cap \bar{q}_i$ and $s^* \subseteq \bar{q}_i \cap \bar{q}_m$. By induction
on the two halves of $q_1, \ldots, q_m$, $s^* \subseteq \bar{q}_i$ for all i.



Now, for each sequence $c = q_1, \ldots, q_m$ with block-distinct overlaps, we construct a subsequence $c'$ with
distinct overlaps by replacing each maximal contiguous subsequence $q_i, \ldots, q_j$ in c with identical overlaps
along $q_i, \ldots, q_j$ with just $q_i$ and $q_j$. The overlaps along $c'$ are distinct and also contain $s^*$. □

Requiring lack of hypercycles resembles the definition of a tree structure over superedges (Definition 
7). The main difference between the two conditions is that the path overlap property requires agreeing on 
a global object, namely the tree structure, and only paths along the tree structure must have the path
overlap property. The independence from a specific tree structure, which will be crucial later in proving 
the correctness of the data structure, is achieved by requiring a uniform degree of connectivity between 
superedges. 

Theorem 12. H is a hyperforest iff H contains no hypercycle. 

Proof.

$\Rightarrow$: Let $T_H$ be any tree structure of H. We will verify the two conditions in Lemma 11.

1. To show that any two k-superedges $q_1, q_2$ overlap simply, let $k'$ be the largest value such that $q_1$ and
$q_2$ are in the same $(k' - 1)$-superedge p. Let $r_1$ be the $k'$-superedge that contains $q_1$, and
$r_2 \ne r_1$ be the $k'$-superedge that contains $q_2$. By Theorem 8, $r_1$ and $r_2$ overlap simply, so $q_1$ and
$q_2$ also overlap simply.

2. Let $c = q_1, \ldots, q_m$ be a sequence of distinct k-superedges with distinct overlaps $s_i$ of size k. We
want to show that for all i, $s^* \subseteq s_i$. Consider the path d in $T_Q$ which is the concatenation of the paths
in $T_Q$ between every $q_i, q_{i+1}$: $d = q_1, \ldots, q_2, \ldots, q_m$. Since all overlaps $s_i$ are of size k, all
adjacent k-superedges in d in the sub-path $q_i, \ldots, q_{i+1}$ have the same overlap $s_i$. Label each edge
$(q, r)$ along the path d in $T_Q$ with the overlap $\bar{q} \cap \bar{r}$. Then, for $i \ne j$, the set of edges from
$q_i$ to $q_{i+1}$ and the set of edges from $q_j$ to $q_{j+1}$ are disjoint because their labels are different
($s_i \ne s_j$). Of course, for all i, the edges between $q_i$ and $q_{i+1}$ are distinct because they form a simple
path. Therefore, all edges in the path d are distinct, and d is a simple path. Moreover, $T_Q$ is a tree
structure, so $\forall\, 1 < i < m,\ s^* \subseteq \bar{q}_i$.

$\Leftarrow$: For a hypergraph H with no hypercycles, we will show that H is a hyperforest by constructing a
tree structure $T_p$ over the set of k-superedges $Q_p$ of each (k - 1)-superedge p (Theorem 8). For each covered
hyperedge (subset of a hyperedge) s of size exactly k, consider the k-superedges
$q_1, q_2, \ldots, q_m \in Q_p$ that contain it. Note that since these are k-superedges in a (k - 1)-superedge,
their overlap is of size exactly k, and is therefore exactly s. Choose one of these, say $q_1$, arbitrarily and
connect all remaining $q_2, \ldots, q_m$ to $q_1$ in $T_p$ (i.e. $\forall\, 1 < i \le m,\ (q_1, q_i) \in T_p$).

To show that $T_p$ is a tree, label each edge in $T_p$ with its corresponding overlap. By construction of
$T_p$, for each label, there is a single k-superedge ($q_1$ in the above notation) that is incident to all edges so
labeled. Therefore, every simple path must contain at most two edges of the same label, and if there are
two such edges in the path, they must be adjacent. Suppose for contradiction that there is a simple cycle
$c = q_0, q_1, \ldots, q_m$ in $T_p$ with $s_i = \bar{q}_i \cap \bar{q}_{i+1}$ for $0 \le i < m$, such that
$s_0 \ne s_1$. $q_1, \ldots, q_m$ is a sequence of distinct k-superedges with block-distinct overlaps. Due to
condition 2 of Lemma 11, $s_0 \subseteq s_1$. But this is a contradiction since both overlaps are of size k.
Thus, c could not have existed, and $T_p$ is indeed a tree.

The simplicity of the overlaps and the path overlap property in T p now follow immediately from the 
definition of a hypercycle. 

□ 



    QUERY(h_new):
        compute \bar{H}_{h_new} and Z_k
        for k <- K downto 1 do
            for s, t in S_k do
                if U_k(s, t) and not Z_k(s, t) then
                    return FALSE
        return TRUE

    INSERT(h_new):
        assert QUERY(h_new)
        for k <- 1 to K do
            for s, t in S_k do
                union(s, t) in U_k

Figure 4: Pseudocode for Query and Insert. $S_k$ denotes the set of k-supervertices in $\bar{H}_{h_{\text{new}}}$.
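The pseudocode of Figure 4 can be turned into a short executable sketch. This is one possible reading, not the authors' implementation: the class names `UnionFind` and `HyperforestChecker` are hypothetical, vertices are assumed to be sortable labels, the relation Z_k is recomputed from scratch at each level, and the union-by-rank heuristic needed for the stated amortized bound is omitted for brevity. Following Section 4.2, Insert unions all k-subsets of the new hyperedge.

```python
from itertools import combinations


class UnionFind:
    """Simplified disjoint sets (path halving only; no union by rank)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)


def k_subsets(h, k):
    return [frozenset(c) for c in combinations(sorted(h), k)]


class HyperforestChecker:
    def __init__(self, K):
        self.K = K
        self.covered = set()  # non-empty covered hyperedges of H
        self.U = {k: UnionFind() for k in range(1, K + 1)}

    def query(self, h_new):
        h_new = frozenset(h_new)
        if len(h_new) > self.K + 1:
            raise ValueError("hyperedge too large for width K")
        # Projection: subsets of h_new that are already covered.
        proj = [s for r in range(1, len(h_new) + 1)
                for s in k_subsets(h_new, r) if s in self.covered]
        for k in range(self.K, 0, -1):
            Z = UnionFind()  # connectivity of k-supervertices inside h_new
            for h in proj:
                subs = k_subsets(h, k)
                for s in subs[1:]:
                    Z.union(subs[0], s)
            S_k = [s for s in proj if len(s) == k]
            for s, t in combinations(S_k, 2):
                connected_in_H = self.U[k].find(s) == self.U[k].find(t)
                if connected_in_H and Z.find(s) != Z.find(t):
                    return False
        return True

    def insert(self, h_new):
        h_new = frozenset(h_new)
        assert self.query(h_new)
        for k in range(1, self.K + 1):
            subs = k_subsets(h_new, k)
            for s in subs[1:]:
                self.U[k].union(subs[0], s)
        for r in range(1, len(h_new) + 1):
            self.covered.update(k_subsets(h_new, r))


checker = HyperforestChecker(K=2)
checker.insert("abc")
checker.insert("cde")
closes_hypercycle = not checker.query("aef")
```

In this usage example, a and e are already connected through c, but they are not connected inside the candidate hyperedge {a, e, f}, so the query rejects it.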

4 The Data Structure 

We use ordinary Union-Find structures at each of the K levels. At level k, a Union-Find structure $U_k$ keeps
track of disjoint sets of connected k-supervertices.

Definition 13 (Supervertex). A k-supervertex (same as a covered (k - 1)-hyperedge) of a hypergraph H
is a k-subset of some hyperedge $h \in H$. Two k-supervertices $s_1$ and $s_2$ are k-connected (or just connected)
if there exists a (k - 1)-superedge q of H such that $s_1, s_2 \subseteq \bar{q}$.

4.1 Overview of the data structure state 

We maintain the following two values with respect to the current hyperforest H:

• The set $\bar{H}$ of all covered hyperedges in H, equivalent to the supervertices of H.

• The transitive relation $U_k$ for each $k \le K$, where $U_k(s, t)$ specifies whether the two k-supervertices
s and t are connected.

In addition, in each Query operation, we compute two additional values that are dependent on $h_{\text{new}}$.
These two values are projections of $\bar{H}$ and $U_k$ onto $h_{\text{new}}$.

• The set $\bar{H}_{h_{\text{new}}}$, which is the projection of $\bar{H}$ onto $h_{\text{new}}$.

• The relation $Z_k$ for each $k \le K$, where $Z_k(s, t)$ specifies whether two k-supervertices s and t are
connected in $\bar{H}_{h_{\text{new}}}$.

4.2 The Query and Insert operations 

QUERY($h_{\text{new}}$) returns FALSE iff for some k, there exist two k-supervertices s and t such that s and t
are connected in H but not connected in $\bar{H}_{h_{\text{new}}}$. If QUERY($h_{\text{new}}$) returns TRUE,
INSERT($h_{\text{new}}$) connects all k-supervertices of $h_{\text{new}}$ in $U_k$. See Figure 4 for pseudocode.

Roughly speaking, QUERY returns FALSE when s and t are connected "outside" of $h_{\text{new}}$. In this case,
adding $h_{\text{new}}$ would close a hypercycle. It is not enough to merely require that s and t are connected in H,
since if they are also connected to the same extent inside $h_{\text{new}}$, $h_{\text{new}}$ might "collapse" into the
path between s and t.



4.3 Correctness 

We now show that QUERY($h_{\text{new}}$) returns TRUE iff $H' = H \cup \{h_{\text{new}}\}$ is a hyperforest (see
footnote 4). Let $\bar{H}' = \overline{H \cup \{h_{\text{new}}\}}$ be the supervertices of the augmented hyperforest.

Theorem 14. If QUERY($h_{\text{new}}$) returns TRUE, then $H'$ is a hyperforest.

Proof. From any tree structure $T_H$ of H, we will construct a tree structure $T_{H'}$ for $H'$. The idea is to break
up $T_H$ into the subtrees $T_1(H_1), \ldots, T_m(H_m)$ that $h_{\text{new}}$ separates, and then connect each subtree
to $h_{\text{new}}$ via some $h_i \in T_i$.

Specifically, to get $T_{H'}$, remove each edge $(h_a, h_b)$ in $T_H$ such that $h_a \cap h_b \subseteq h_{\text{new}}$.
We claim that letting $h_i = \arg\max_{h' \in H_i} |h' \cap h_{\text{new}}|$ (the hyperedge that overlaps with
$h_{\text{new}}$ the most) creates a valid tree structure. It is enough to verify that the path overlap property holds
for paths in $T_{H'}$ involving $h_{\text{new}}$: both paths $h_a, \ldots, h_{\text{new}}, \ldots, h_b$ passing through
$h_{\text{new}}$ and paths in which $h_{\text{new}}$ is an endpoint.

Paths passing through $h_{\text{new}}$ connect $h_a, h_b$ from different subtrees, and so the path from $h_a$ to $h_b$
in $T_H$ must have contained a removed edge, and $h_a \cap h_b \subseteq h_{\text{new}}$.

For paths where $h_{\text{new}}$ is an endpoint, we show the path overlap property by showing that the sequence of
overlaps between $h_{\text{new}}$ and every hyperedge along the path from $h_a$ to $h_{\text{new}}$ in $T_{H'}$ increases
telescopically. To do this, we must argue that (1) the overlaps form a subset relation, and that (2) there are
no "local minimum" overlaps where both the preceding and following overlaps are proper supersets of an
overlap. Formally, we verify two properties (see footnote 5):

1. For any $(h_a, h_b) \in T_H$, if $h_a \cap h_b \not\subseteq h_{\text{new}}$, then either $h'_a \subseteq h'_b$ or
$h'_b \subseteq h'_a$. Otherwise, there would be two k-supervertices in $h_a$ and $h_b$ connected "outside of" H.

2. For any path $h_a, h_b, h_c$ in $T_H$ such that $h_a \cap h_b \not\subseteq h_{\text{new}}$ and
$h_b \cap h_c \not\subseteq h_{\text{new}}$, $|h'_b| \ge \min\{|h'_a|, |h'_c|\}$.
Otherwise, there would be two k-supervertices in $h_a$ and $h_c$ connected "outside of" H.

□ 

Theorem 15. If QUERY($h_{\text{new}}$) returns FALSE, then $H'$ is not a hyperforest.

Proof. We will show how to construct a hypercycle using the two k-supervertices s, t that are k-connected
"outside" $h_{\text{new}}$. We will construct a hypercycle that includes $h_{\text{new}}$ and the k-connected path
between s and t. Let $q_1, \ldots, q_m$ be the k-connected path in the tree structure $T_Q$ over the k-superedges
$Q_p$ of the (k - 1)-superedge p containing s and t.

Intuitively, $h_{\text{new}}, q_1, \ldots, q_m$ "forms a hypercycle," but $h_{\text{new}}$ may collapse any of the
$q_i$'s into one k-superedge. So we extract a subpath of $q_1, \ldots, q_m$ such that $h_{\text{new}}$ collapses at
most the terminals $q_1, q_m$.

For each i, let $u_i$ be a maximum overlap between $h_{\text{new}}$ and a hyperedge in the k-superedge $q_i$ along
the path. We also require $|u_i| > k$; otherwise, we say that $u_i$ does not exist. We choose $u_1$ and $u_m$ so that
$u_1 \supseteq s$ and $u_m \supseteq t$. See Figure 5 for an illustration.

Extracting the subpath proceeds in two steps:

1. Choose a subpath for which all overlaps between adjacent k-superedges on this subpath are not
contained in $h_{\text{new}}$. This is possible because s and t are not k-connected in $\bar{H}_{h_{\text{new}}}$.

2. Choose a subpath $q_i, \ldots, q_j$ from the resulting subpath of the first step such that the $u_r$'s do not exist
for $i < r < j$, but $u_i, u_j$ do exist.



4 Due to space limitations we provide only proof outlines; see [LS03] for full proofs.

5 For notational convenience, denote the overlap of $h_{\text{new}}$ and a hyperedge as the hyperedge with an added
prime. For instance, $h'_a = h_a \cap h_{\text{new}}$.




Figure 5: Construction of a hypercycle in Theorem 15. 

Now, consider the k-superedges $c' = q'_1, \ldots, q'_{l'}$ in $H'$ after $h_{\text{new}}$ optionally collapses the
terminal superedges. All overlaps in this sequence are block-distinct and of size k. If $l' \ge 3$, $c'$ is a regular
hypercycle. If $l' = 2$, $c'$ is a hyperdoublet. If $l' = 1$, we have to delve deeper and look at the
$k'$-superedges of $q'_1$ for the smallest $k' > k$. There are at least two $k'$-superedges that result. If there are
two, we have a hyperdoublet. Otherwise, we have a hyperloop around the overlap of size k.

Therefore, H' is not a hyperforest. □ 

4.4 Time complexity 



In Query, we compute $\bar{H}_{h_{\text{new}}}$ by iterating through all subsets of $h_{\text{new}}$ and selecting the
ones that are in $\bar{H}$. We compute $Z_k$ by iterating through all $h' \in \bar{H}_{h_{\text{new}}}$ and unifying
all pairs of k-subsets of $h'$. There are fewer than $4^{|h_{\text{new}}|}$ pairs of k-subsets in $h_{\text{new}}$, and so
computing $Z_k$ requires no more than $O(K 4^K)$ time.

Each $U_k$ can be implemented as an ordinary Union-Find structure. The number of supervertices stored
in these Union-Find structures is bounded by the maximal number of covered hyperedges in a hyperforest of
width K, which is less than $|V| 2^{K+1}$. The amortized time of each call to Find is then
$O(\alpha(|V| 2^K))$, where $\alpha$ is the inverse Ackermann function. Each Query operation calls Find once for
each pair of k-subsets in $h_{\text{new}}$, yielding a combined amortized run-time of
$O(4^K (K + \alpha(|V| 2^K)))$ per Query operation.

Each Insert operation calls Union a similar number of times, so its amortized run-time is also
$O(4^K (K + \alpha(|V| 2^K)))$.
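One way to see the bound on the number of pairs of k-subsets is the following short calculation, given here as a supporting sketch rather than as the authors' argument. Write $n = |h_{\text{new}}| \le K + 1$ and count unordered pairs of k-subsets over all k; the middle equality is the Vandermonde identity.

```latex
\sum_{k=0}^{n} \binom{\binom{n}{k}}{2}
  \;\le\; \frac{1}{2} \sum_{k=0}^{n} \binom{n}{k}^{2}
  \;=\; \frac{1}{2} \binom{2n}{n}
  \;<\; 4^{n}.
```

Since $n \le K + 1$, the total is $O(4^K)$, which matches the $4^K$ factor in the run-times above.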

5 Experiments with Greedy Hypertrees 

Given as input weights on candidate hyperedges, the weight of a hyperforest is equal to the sum of the
weights of all hyperedges it covers. In the K-maximum hypertree problem, we would like to find the
K-hyperforest of maximum weight. When K > 1 the problem is NP-hard [Sre00]. Figure 6 provides an
example where greedy approaches perform suboptimally.




Figure 6: An example hypergraph on which a greedy algorithm would capture asymptotically none of the 
weight of the optimal hypertree. 

The common greedy heuristic for constructing a high-weight hyperforest is Prim-like: we start with the
highest-weight hyperedge, and at each iteration consider only candidate hyperedges of the form
$s \cup \{v\}$, where $s \subset h \in H$ is a subset of size exactly K of a hyperedge of the "current" hyperforest,
and v is a new vertex not yet in the hyperforest. The heaviest (see footnote 6) such hyperedge is added to the
hyperforest, which remains fully K-connected.

6 If weights are specified also for non-maximal hyperedges, then when considering the weight of a hyperedge,
the weights of all its sub-hyperedges not already covered by H are added to it.

Using the data structure described in this paper, one can consider a more flexible Kruskal-like greedy 
procedure: at each iteration, all hyperedges which do not cause hypercycles are considered, and the heaviest 
one is added to the hyperforest. 
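The Kruskal-like procedure can be sketched as a loop around any oracle offering Query and Insert. The function name `kruskal_like_greedy` and the stand-in class `ForestChecker` (which handles only the K = 1 case of ordinary forests) are hypothetical and serve only to keep the example self-contained. The loop rescans the remaining candidates after every insertion, on the assumption that a hyperedge rejected earlier may become admissible later, since hyperforests are not a monotone family.

```python
class ForestChecker:
    """Stand-in oracle for K = 1: hyperedges are ordinary two-vertex edges."""

    def __init__(self):
        self.parent = {}

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def query(self, edge):
        u, v = tuple(edge)
        return self._find(u) != self._find(v)

    def insert(self, edge):
        u, v = tuple(edge)
        self.parent[self._find(u)] = self._find(v)


def kruskal_like_greedy(weights, checker):
    """Repeatedly add the heaviest candidate that the oracle accepts."""
    remaining = sorted(weights, key=weights.get, reverse=True)
    chosen = []
    while True:
        for h in remaining:
            if checker.query(h):
                checker.insert(h)
                chosen.append(h)
                remaining.remove(h)
                break  # restart the scan from the heaviest remaining candidate
        else:
            return chosen  # no remaining candidate is admissible


weights = {
    frozenset("ab"): 3.0,
    frozenset("bc"): 2.0,
    frozenset("ac"): 1.0,
    frozenset("cd"): 5.0,
}
result = kruskal_like_greedy(weights, ForestChecker())
```

With these weights the edges cd, ab, and bc are added in that order and ac is rejected because it would close a cycle. Any object with the same `query`/`insert` interface, such as a hyperforest checker for larger K, could be substituted for `ForestChecker`.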

To demonstrate the possible utility of a Kruskal-like greedy procedure, as compared to a Prim-like
greedy procedure, we generated random weights on all candidate 2-hyperedges in a hypergraph with 100
vertices in the following way: we first constructed a random "planted" 2-hypertree by augmenting a
hyperforest randomly. Hyperedges outside the planted hypertree were assigned a random weight uniformly
distributed between 0 and 1. In one set of experiments, weights inside the hypertree were assigned random
weights uniformly distributed between 0 and 10. In the second set, the weights were chosen uniformly
between 0 and 1 with probability 1/2, and between 0 and 20 with probability 1/2. We generated 10
random weight-sets of each type, and tried both greedy approaches on each graph. Table 1 summarizes the
weights of the resulting hypertrees. Kruskal performed significantly better on both sets of experiments, and
especially when the weight was less evenly distributed in the "planted" hypertree.





                 U[0, 10]           (1/2) U[0, 1] + (1/2) U[0, 20]
Planted          0.590 ± 0.0339     0.609 ± 0.059
Prim-like        0.506 ± 0.0816     0.323 ± 0.107
Kruskal-like     0.587 ± 0.0342     0.619 ± 0.058

Table 1: Averages and standard deviations of fraction of the weight captured by the hypertrees: the planted
hypertree, the hypertree recovered with a Prim-like greedy approach, and the one recovered with a
Kruskal-like greedy approach.



6 Conclusion 

We have presented a dynamic data structure for keeping track of acyclicity in hypergraphs, allowing
augmenting a hyperforest while ensuring new hyperedges do not break its acyclicity. Each operation takes time
which is almost constant in the size of the hyperforest but, like most hyperforest algorithms, is exponential
in the tree-width. Although an exponential dependence is probably unavoidable, it might well be possible
to reduce the precise dependence.

The new dynamic data structure allows efficient implementation of Kruskal-like greedy heuristics for
finding high-weight hyperforests that have some advantages over Prim-like heuristics. However, the
important problem of constructing efficient algorithms that closely approximate the maximum-weight hypertree
remains open.